IAES International Journal of Robotics and Automation (IJRA) 
Vol. 11, No. 1, March 2022, pp. 33~42 
ISSN: 2722-2586, DOI: 10.11591/ijra.v11i1.pp33-42


Person following control for a mobile robot based on color 
invariance corresponding to varying illumination 


Shinsuke Oh-hara¹, Kaoru Saito², Atsushi Fujimori¹
¹Department of Mechanical Engineering, Faculty of Engineering, University of Yamanashi, Kofu, Japan
²Mechanical Co. Ltd., Yamanashi, Japan


Article Info

Article history:
Received Jun 7, 2021
Revised Dec 29, 2021
Accepted Jan 4, 2022

Keywords:
Color invariance
Image recognition
Mobile robot
Particle filter
Person following control

ABSTRACT

In this paper, we present a method of person following control for a mobile robot using visual information. Color information is often used for object tracking, but the color of an object varies greatly when the illumination changes. Under such conditions, a robot controlled by visual information may lose sight of the person. In this paper, we consider a robust person following method based on color invariance and image-based control. Color invariance provides features of colored objects that are robust to changing illumination conditions. First, we estimate the lowest positions of both feet of a tracked person through particle filters based on color invariance. Then, we control the velocity of the robot to track the person by using an image-based controller. Experimental results using an actual robot demonstrate the effectiveness of the proposed method.

This is an open access article under the CC BY-SA license.

Corresponding Author: 


Shinsuke Oh-hara 

Department of Mechanical Engineering, Faculty of Engineering, University of Yamanashi 
Kofu, Yamanashi, Japan 

Email: sohhara@yamanashi.ac.jp 


1. INTRODUCTION 

Robotics technology has made remarkable progress and is expanding not only in special environments such as manufacturing and space but also into fields such as medical treatment, welfare, and entertainment [1]-[3]. Examples of service robots include carts that automatically follow customers in supermarkets and carts that carry nursing care products for caregivers. Such robots require functions that identify the target person and track that person automatically.

To identify and automatically track a person, it is important to recognize the person and estimate the person's position and direction, such as the distance between the person and the robot. Researchers have implemented such recognition and tracking functions using distance sensors such as laser range finders (LRFs) [3], [4] and cameras [5]-[7]. Okusako et al. [3] estimated the foot position of a person by comparing the data obtained from an LRF with templates of foot arrangement data prepared beforehand. Leigh et al. [4] proposed a method for robustly detecting a person's legs using an LRF in indoor and outdoor environments. Camera-based person tracking has been actively studied in the field of image recognition [5]-[7]. Perez et al. [5] built a color model of an object from the color information of an image captured by a camera and proposed a method for tracking the object based on the model. Rincon et al. [6] proposed a method for estimating the position and posture of a person using a particle filter based on image information obtained from a camera. Ahmad et al. [7] developed a person tracking algorithm using a convolutional neural network for an overhead-view camera. Building on these developments in image recognition, camera-based person following robots have been widely studied using monocular cameras [2], [8], [9], stereo cameras [10], and red, green, blue, and depth (RGB-D) cameras [11]. Moreover, Koide et al. [12] developed a person-following robot based on the fusion of a camera and an LRF.

Nakano et al. [2] proposed a method that recognizes both feet of a target person from image information and makes the robot follow the person. Because only the feet need to be recognized, the camera does not need to be mounted high. Thus, the distance between the tracked person and the robot can be reduced, and safety for the user is improved by lowering the robot's center of gravity. The method is also simple and inexpensive: it uses a commercial camera and identifies the person only by the color information of the feet, so little user or environment learning is required. In this paper, we likewise recognize the person's feet from the image information of a camera.

Nakano et al. [2] convert the image positions of the recognized feet into three-dimensional positions and use them for control. A control method that uses three-dimensional positions obtained from image information in this way is called a position-based method. Its control law is easy to implement. However, three-dimensional reconstruction from camera images is known to be vulnerable to errors in the camera parameters and susceptible to image quantization. If the camera parameters contain such errors, the position-based method cannot keep a proper distance to the person, and tracking may fail.

Although person or object recognition based on the color information of camera images is easy to implement, the apparent color in the image changes when the illumination brightness of the environment changes, and it can differ greatly from the original color of the tracked object. This can cause the mobile robot to lose sight of the person being tracked.

In this paper, we consider person tracking control based on an image-based method and color invariance [13]-[15]. The image-based method is a visual feedback control that uses state variables on the camera image plane. Because the features of the target are controlled directly in the image, three-dimensional reconstruction is unnecessary, and the method is robust to model errors in the camera parameters [16]-[18]. Color invariance likewise provides features that are robust to apparent changes such as illumination lightness and shadows. We extract the features of a person based on color invariance and use them for tracking control by a mobile robot. The effectiveness of the proposed method is evaluated through indoor experiments in which one camera is mounted on a mobile robot to capture the feet of the tracked person, and both feet are estimated from the camera image by two independent particle filters. We demonstrate the effectiveness of the proposed method by measuring, with an illuminance sensor, the illuminance values at which tracking is still possible.


2. IMAGE-BASED CONTROL WITH MOBILE ROBOT 
This section presents a model of a two-wheeled mobile robot equipped with a camera and realizes tracking control of the robot by an image-based method based on [17].


2.1. Modeling for mobile robots with camera 

Figure 1 shows the model of the mobile robot. In this model, the camera is assumed to be mounted horizontally at the midpoint between the left and right wheels, facing the traveling direction of the mobile robot. v denotes the translational speed of the mobile robot and ω denotes its rotational speed. We define the camera coordinate system O'–X'Y'Z' with its origin at the center of the wheels. If the camera were mounted horizontally at the center of the wheels, it would have to be placed high in order to view the feet of the person. Therefore, in this paper, the actual camera is attached to the front of the robot and tilted downward, as shown in Figure 2; its camera coordinate system is denoted O–XYZ. The image coordinate systems corresponding to O'–X'Y'Z' and O–XYZ are denoted O'ᵢ–x'y' and Oᵢ–xy, respectively, and the origin of each image coordinate system is the image center. The coordinate system O–XYZ of the tilted camera can be converted to the horizontal camera coordinate system O'–X'Y'Z' at the center of the wheels by applying a translation vector and a rotation matrix. For simplicity, we consider the kinematic model in the horizontal camera coordinate system O'–X'Y'Z'. The target point P is assumed to be always on the ground.
Figure 1. Mobile robot model


Figure 2. A mobile robot equipped with a camera 
and image planes 


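The kinematic model of the point $P = [X', H, Z']^T$ with respect to the mobile robot is expressed by (1).

$$\frac{d}{dt}\begin{bmatrix} X' \\ H \\ Z' \end{bmatrix} = -V - \Omega \times \begin{bmatrix} X' \\ H \\ Z' \end{bmatrix} \tag{1}$$

where $V = [0\ \ 0\ \ v]^T$ and $\Omega = [0\ \ {-\omega}\ \ 0]^T$. In addition, since P is on the ground, its height H is constant, and (1) can be expressed as

$$\begin{bmatrix} \dot{X}' \\ \dot{Z}' \end{bmatrix} = \begin{bmatrix} 0 & Z' \\ -1 & -X' \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix} \tag{2}$$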


In the image-based method, control is performed using coordinates on the image. Next, we project P in the coordinate system O'–X'Y'Z' onto the image plane. The image coordinates $[x'\ \ y']^T$ of P are obtained by (3).

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \frac{f}{Z'} \begin{bmatrix} X' \\ H \end{bmatrix} \tag{3}$$

where f is the focal length of the camera. Differentiating (3) with respect to time and substituting (2), the motion of the feature vector on the image plane can be expressed by (4).

$$\dot{s} = J u \tag{4}$$


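where

$$s = \begin{bmatrix} x' \\ y' \end{bmatrix}, \quad u = \begin{bmatrix} v \\ \omega \end{bmatrix}, \quad J = \begin{bmatrix} \dfrac{x'y'}{fH} & \dfrac{f^2 + x'^2}{f} \\[2ex] \dfrac{y'^2}{fH} & \dfrac{x'y'}{f} \end{bmatrix}$$

The vector s is the controlled variable; in this paper, we refer to s as the image feature point. The matrix J, whose entries depend on the components of s, is called the image Jacobian matrix. Its determinant is $-y'^2/H$, so J is nonsingular unless y' is 0.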
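To make (2)-(4) concrete, the following Python sketch evaluates the image Jacobian J for a ground point and checks it against a finite-difference derivative of the projection (3) under the kinematics (2). The camera height, focal length, point position, and velocities are hypothetical values chosen only for illustration, and the helper names are our own.

```python
# Sketch of Eqs. (2)-(4): image Jacobian of a ground point seen by the
# horizontal camera of a two-wheeled robot. All numbers are hypothetical.

def point_velocity(X, Z, v, w):
    """Eq. (2): time derivative of (X', Z') for a static ground point."""
    return w * Z, -v - w * X

def project(X, Z, H, f):
    """Eq. (3): pinhole projection of P = [X', H, Z'] onto the image plane."""
    return f * X / Z, f * H / Z

def image_jacobian(x, y, H, f):
    """Eq. (4): J such that d/dt [x', y'] = J [v, w]."""
    return [[x * y / (f * H), (f * f + x * x) / f],
            [y * y / (f * H), x * y / f]]

H, f = 0.25, 500.0   # hypothetical camera height [m] and focal length [px]
X, Z = 0.3, 1.2      # hypothetical ground point in the robot frame [m]
v, w = 0.4, 0.2      # hypothetical translational [m/s] and angular [rad/s] speed

x, y = project(X, Z, H, f)
J = image_jacobian(x, y, H, f)
analytic = (J[0][0] * v + J[0][1] * w, J[1][0] * v + J[1][1] * w)

# Finite-difference check: move the point for a short time and re-project.
dt = 1e-6
dX, dZ = point_velocity(X, Z, v, w)
x1, y1 = project(X + dX * dt, Z + dZ * dt, H, f)
numeric = ((x1 - x) / dt, (y1 - y) / dt)

# det(J) = -y'^2 / H, so J is invertible whenever y' != 0.
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(analytic, numeric, det)
```

The analytic and finite-difference image velocities agree to within the discretization error, and the determinant matches $-y'^2/H$, which is why the text can state that J is nonsingular whenever the point is seen below the horizon.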


2.2. Image-based tracking control 

In this paper, we control the mobile robot so that the image feature point on the image plane coincides with the target point. Let s* be the target value in the tilted camera image coordinate system Oᵢ–xy. The control error e is then expressed as (5).

$$e = s - s^* \tag{5}$$

Next, let e' be the error e converted to the horizontal camera image coordinate system O'ᵢ–x'y'. Since the target s* is constant, differentiating e' with respect to time gives (6).

$$\dot{e}' = J u \tag{6}$$


The matrix J is nonsingular as long as y' ≠ 0, that is, as long as P is not at the same height as the camera. In our problem setting, whenever the downward-tilted camera can see the target P in the image, J is nonsingular and its inverse exists. Therefore, the following control law is used for the speed control of the mobile robot.

$$u = -\Lambda J^{-1} e', \qquad \Lambda = \begin{bmatrix} \lambda_v & 0 \\ 0 & \lambda_\omega \end{bmatrix} \tag{7}$$

where $\lambda_v$ and $\lambda_\omega$ are positive feedback gains related to the translational and angular speeds, respectively. Substituting (7) into (6), the closed-loop system of the mobile robot is expressed as follows:

$$\dot{e}' = -J \Lambda J^{-1} e' \tag{8}$$

The matrix $J\Lambda J^{-1}$ is similar to $\Lambda$, so its eigenvalues are the positive gains $\lambda_v$ and $\lambda_\omega$; with equal gains $\lambda_v = \lambda_\omega = \lambda$, (8) reduces to $\dot{e}' = -\lambda e'$ and the error converges exponentially. It can thus be seen that the system can be stabilized by the control law (7). In this paper, we detect the lowest positions of both feet of a person in the camera image and use the average of the two positions as the image feature point s. The image Jacobian matrix J is computed from s in real time, and person tracking is realized by the control law (7).
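The following sketch simulates the closed loop of (6)-(8) for a stationary person, using the gains $\lambda_v = 0.6$ and $\lambda_\omega = 1.8$ reported in Section 3.2.6. It is a simplified illustration, not the authors' implementation: it works entirely in the horizontal camera frame, the target feature is assumed to be a hypothetical point straight ahead at a desired distance Z_d (rather than the center of the tilted image converted to that frame), and Euler integration with a hypothetical control period stands in for the real robot.

```python
# Closed-loop sketch of the image-based controller, Eq. (7):
#   u = -Lambda J^{-1} e',  Lambda = diag(lam_v, lam_w).
# Assumptions: static person, horizontal-camera coordinates, Euler integration.

def image_jacobian(x, y, H, f):
    """Eq. (4) image Jacobian for the feature s = [x', y']."""
    return [[x * y / (f * H), (f * f + x * x) / f],
            [y * y / (f * H), x * y / f]]

def solve2(J, e):
    """Return J^{-1} e for a nonsingular 2x2 matrix J."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d * e[0] - b * e[1]) / det, (-c * e[0] + a * e[1]) / det)

def control(s, s_star, H, f, lam_v=0.6, lam_w=1.8):
    """Eq. (7): translational speed v and angular speed w from the image error."""
    e = (s[0] - s_star[0], s[1] - s_star[1])
    jv, jw = solve2(image_jacobian(s[0], s[1], H, f), e)
    return -lam_v * jv, -lam_w * jw

H, f = 0.25, 500.0           # hypothetical camera height [m], focal length [px]
Z_d = 1.0                    # hypothetical desired following distance [m]
s_star = (0.0, f * H / Z_d)  # target feature: straight ahead at distance Z_d
X, Z = 0.4, 2.0              # initial position of the person's feet [m]
dt = 0.01                    # hypothetical control period [s]

for _ in range(2000):        # 20 s of simulated time
    s = (f * X / Z, f * H / Z)          # Eq. (3)
    v, w = control(s, s_star, H, f)     # Eq. (7)
    dX, dZ = w * Z, -v - w * X          # Eq. (2)
    X, Z = X + dX * dt, Z + dZ * dt

print(round(X, 6), round(Z, 6))  # approaches (0, Z_d)
```

For this particular target, the control law works out to $\omega = -\lambda_\omega X'/Z_d$, so the lateral offset decays while the translational command closes the distance to Z_d; the feature point never reaches y' = 0 because the person stays in front of the camera.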


3. FEET RECOGNITION AND TRACKING OF PERSON BASED ON COLOR INVARIANCE 
3.1. Color invariance 

The color invariance [13] is robust to apparent changes such as shadows and illumination lightness. Two invariances, H and C, are derived from spectral derivatives of the Kubelka-Munk physical model of the reflected light spectrum. In practice, H and C are calculated from linear transformations of the RGB color information of the image that correspond to the differential coefficients of the reflected spectrum, as in (9) and (10).


$$H = \frac{0.30R + 0.04G - 0.35B}{0.34R - 0.60G + 0.17B} \tag{9}$$

$$C = \frac{0.30R + 0.04G - 0.35B}{0.06R + 0.63G + 0.27B} \tag{10}$$


The invariance H depends solely on the color of the object's surface, and C is an invariant for non-glossy objects. A constant value of H or C defines a plane in RGB space. Since H and C can become infinite, we bound them with the inverse tangent:

$$\theta_H = \tan^{-1} H, \qquad \theta_C = \tan^{-1} C \tag{11}$$


Figure 3 shows original images and the images obtained by converting the image taken with the light turned off into H and C, displayed in grayscale. Figure 3(a) shows the image captured in an illuminated room; the color and the pattern on the cover of the object are clearly recognizable. Figure 3(b) shows the image taken in the room with the lights off, in which it is difficult to recognize the color and the pattern. In the image converted by H in Figure 3(c), the characteristics of the object become clearer. In contrast, C in Figure 3(d) changes little over the whole image, which obscures the object characteristics visible in the H image. This suggests that using C together with H may degrade recognition of the target, so it is effective to rely primarily on H. However, H cannot distinguish a given color from its opposite color; Kobayashi et al. [14] addressed this by dividing the value of H according to C so that opposite colors correspond to different values. In this paper, following [14], the invariance H' below is obtained by using C₀ = −0.01 as the reference value of C.


Figure 3. Examples of images transformed using color invariance: (a) original image with light, (b) original image without light, (c) color invariance H, (d) color invariance C


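$$H' = \begin{cases} \theta_H - \pi & (\theta_C \ge \theta_{C_0},\ \theta_H \ge 0) \\ \theta_H & (\theta_C \ge \theta_{C_0},\ \theta_H < 0) \\ \theta_H & (\theta_C < \theta_{C_0},\ \theta_H \ge 0) \\ \theta_H + \pi & (\theta_C < \theta_{C_0},\ \theta_H < 0) \end{cases} \tag{12}$$

where $\theta_{C_0} = \tan^{-1} C_0$.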


In this paper, the invariance H' of (12) is used as the feature quantity of the feet of the tracked person. Hue is another index of color and is also robust to changes in illumination lightness; using this property, Ri and Fujimoto [19] proposed a person tracking method that fuses hue and location. However, a comparative study by Matsumoto et al. [15] shows that the invariance H' is more robust to illumination changes than hue. Therefore, in this paper, we use the color invariance H' for person tracking.
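The per-pixel conversion from RGB to H' in (9)-(12) can be sketched as follows. This is a minimal Python sketch assuming floating-point RGB values and C₀ = −0.01, and it follows the case structure of (12); the handling of a zero denominator (mapped to ±π/2) is our own assumption, since the text only states that the inverse tangent is used to keep the values finite.

```python
import math

C0 = -0.01  # reference value of C used to split H (Section 3.1)

def _atan_ratio(num, den):
    """tan^{-1}(num/den), with den == 0 mapped to +/- pi/2 (assumption)."""
    if den == 0.0:
        return math.copysign(math.pi / 2.0, num) if num != 0.0 else 0.0
    return math.atan(num / den)

def h_prime(R, G, B):
    """Color invariant H' of one pixel following Eqs. (9)-(12)."""
    e_l = 0.30 * R + 0.04 * G - 0.35 * B    # numerator of (9) and (10)
    e_ll = 0.34 * R - 0.60 * G + 0.17 * B   # denominator of (9)
    e = 0.06 * R + 0.63 * G + 0.27 * B      # denominator of (10)
    theta_h = _atan_ratio(e_l, e_ll)        # Eq. (11): theta_H = atan(H)
    theta_c = _atan_ratio(e_l, e)           # Eq. (11): theta_C = atan(C)
    if theta_c >= math.atan(C0):            # Eq. (12), first two cases
        return theta_h - math.pi if theta_h >= 0.0 else theta_h
    return theta_h if theta_h >= 0.0 else theta_h + math.pi  # last two cases

red, cyan = h_prime(200, 50, 50), h_prime(50, 200, 200)
print(red, cyan)
```

In this example, a reddish pixel and a cyan-like pixel have similar θ_H values but C values of opposite sign, so (12) places them in different parts of the range (−π, π).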


3.2. Person tracking based on particle filter 

Figure 4 shows a block diagram of the person tracking in this paper. Based on the method of [2], we implement two independent particle filters, one for each foot of the tracked person. The person tracking procedure using the camera is as follows.


Figure 4. Block diagram of the proposed method (determine tracked person; learning feature of tracked person; search bounding line; detection of lowest position of both feet; person following control; particle filter update of right and left feet)
3.2.1. Determination of a tracked person
First, the mobile robot is kept on standby; when a person enters the field of view in front of the camera, the person to be tracked is determined by background subtraction.


3.2.2. Learning feature of the tracked person 

The color information of the tracked person's image is converted into the invariances H and C, and the invariance H' is calculated by (12). We then create and normalize a histogram of H', which is used to compute the likelihood in the particle filters. At the same time, the particle filters are initialized.
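A normalized H' histogram and a simple likelihood lookup might look like the sketch below. The 1000-bin resolution comes from Section 3.2.6; the binning over [−π, π] and the use of histogram back-projection (the likelihood of a pixel is the normalized count of its H' bin) are our assumptions, since the text does not specify how the likelihood is computed from the histogram.

```python
import math

N_BINS = 1000  # number of bins used for the H' histogram (Section 3.2.6)

def bin_index(h_value, n_bins=N_BINS):
    """Map an H' value in [-pi, pi] to a bin index 0..n_bins-1."""
    t = (h_value + math.pi) / (2.0 * math.pi)   # rescale to [0, 1]
    return min(int(t * n_bins), n_bins - 1)

def learn_histogram(h_values, n_bins=N_BINS):
    """Normalized histogram of H' values sampled from the tracked person."""
    hist = [0.0] * n_bins
    for h in h_values:
        hist[bin_index(h, n_bins)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total > 0 else hist

def likelihood(h_value, hist):
    """Hypothetical likelihood: back-projection of one H' value onto the model."""
    return hist[bin_index(h_value, len(hist))]

hist = learn_histogram([0.5, 0.5, 0.5, -2.0])   # toy samples, not real data
print(likelihood(0.5, hist), likelihood(-2.0, hist))
```

In a full tracker, a particle's likelihood would typically aggregate such per-pixel values over a small region around the particle; that aggregation step is left open here.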


3.2.3. Search bounding line for both feet 
Based on the method of [2], the likelihoods of the particles of each particle filter are projected onto and accumulated along the x-axis as in (13).

$$\mathrm{Acc}(x) = \sum_{n=1}^{N_l} w_l^{(n)} \exp\left[-\frac{\left(x - x_l^{(n)}\right)^2}{2\sigma^2}\right] + \sum_{n=1}^{N_r} w_r^{(n)} \exp\left[-\frac{\left(x - x_r^{(n)}\right)^2}{2\sigma^2}\right] \tag{13}$$

where N is the number of particles, $x^{(n)}$ and $w^{(n)}$ are the state and the corresponding likelihood of the n-th particle of each particle filter, respectively, and σ is the width of the Gaussian kernel. The subscripts l and r denote the left and right foot, respectively. First, we compute the center of gravity of the particles of each particle filter. Then, the x-value at which Acc(x) is minimum between the x-values of the two centers of gravity is used as the dividing line.
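The dividing-line search can be sketched as follows, assuming the Gaussian-kernel form of Acc(x) with a hypothetical kernel width `sigma` and a one-pixel scan step. Each filter is represented as a list of (x, likelihood) pairs; the function names and the brute-force scan are our own illustration rather than the authors' code.

```python
import math

def acc(x, left, right, sigma):
    """Accumulated likelihood Acc(x): Gaussian-weighted sum over both filters.
    left/right are lists of (x_n, w_n) pairs."""
    return sum(w * math.exp(-0.5 * ((x - xn) / sigma) ** 2)
               for xn, w in left + right)

def centroid(particles):
    """Likelihood-weighted center of gravity in x."""
    wsum = sum(w for _, w in particles)
    return sum(xn * w for xn, w in particles) / wsum

def dividing_line(left, right, sigma=5.0, step=1.0):
    """x between the two centroids where Acc(x) is minimum (brute-force scan)."""
    a, b = sorted((centroid(left), centroid(right)))
    best_x, best_val = a, acc(a, left, right, sigma)
    x = a
    while x <= b:
        val = acc(x, left, right, sigma)
        if val < best_val:
            best_x, best_val = x, val
        x += step
    return best_x

left = [(-42.0, 1.0), (-40.0, 1.0), (-38.0, 1.0)]   # toy left-foot particles
right = [(38.0, 1.0), (40.0, 1.0), (42.0, 1.0)]     # toy right-foot particles
print(dividing_line(left, right))
```

With two symmetric clusters the minimum of Acc(x) falls midway between them; with real particle sets the line shifts toward the less certain foot.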


3.2.4. Searching for the lowest points of both feet 

We determine the lowest positions of both feet in the image by using the dividing line between the feet and the likelihood of the particles. The x-coordinate of the center of gravity of the particles of each filter is used as the x-coordinate of the lowest position. In this system, particles with negative y-coordinates tend to have high likelihood because the foot is projected in the upper part of the image plane. Therefore, we examine the likelihood of each particle starting from the upper part of the image plane, and when the likelihood of a particle is lower than the set threshold, we regard the y-coordinate of that particle as the lowest point of the foot. Figure 5 shows the lowest points of both feet found by this method. The green dots are the lowest points of both feet, the bars on both sides of the image indicate the magnitude of the likelihood of each particle in the y-axis direction, and the bars at the bottom indicate the magnitude of the likelihood in the x-axis direction. Blue shows the likelihood of the right foot, and red that of the left foot.
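One reading of this search is sketched below for a single foot. It assumes image coordinates with y increasing downward and, as an additional assumption not stated in the text, only accepts a low-likelihood particle after the top-down scan has already passed at least one particle above the threshold, so that low-likelihood particles above the foot are skipped. The threshold used in the example is hypothetical.

```python
def lowest_point(particles, threshold):
    """Lowest point of one foot from its particles.

    particles: list of (x, y, likelihood) in image coordinates with the origin
    at the image center and y increasing downward (assumption).
    Returns (x, y), or None if no high-to-low transition is found.
    """
    total = sum(w for _, _, w in particles)
    x_low = sum(x * w for x, _, w in particles) / total  # center of gravity in x
    inside_foot = False
    for _, y, w in sorted(particles, key=lambda p: p[1]):  # scan top to bottom
        if w >= threshold:
            inside_foot = True          # scan has reached the foot region
        elif inside_foot:
            return x_low, y             # first low-likelihood particle below it
    return None

toy = [(10, -50, 0.001), (12, -40, 0.3), (8, -30, 0.4),
       (10, -20, 0.2), (10, -10, 0.0001)]   # toy particles, not real data
print(lowest_point(toy, threshold=0.01))
```

Unlike edge detection on the image, this search only looks at particle likelihoods, which is what allows it to work on blurred frames.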


Figure 5. Example of the feature points extracted by the proposed method 


3.2.5. Tracking control by an image-based method 

The average of the obtained lowest positions of both feet is set as the feature s, and s is converted into the horizontal camera image coordinate system O'ᵢ–x'y'. The converted feature is then used in the control law (7).


3.2.6. Particle filter update 


In the particle filter update, the particles are resampled, and the process returns to the dividing-line search of Section 3.2.3. In this paper, we assume a constant-velocity motion model for the particle filters. The mobile robot is controlled to follow the person by repeating these procedures.
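A particle filter update with resampling and a constant-velocity motion model can be sketched as below. The paper uses the condensation implementation in OpenCV; this stand-alone sketch instead uses systematic resampling and Gaussian process noise, both of which are our assumptions, with a state [x, y, vx, vy] per particle in image coordinates. In the experiments, each foot's filter holds 1200 particles.

```python
import random

def predict(particles, dt, noise_pos, noise_vel, rng):
    """Constant-velocity motion model with additive Gaussian noise.

    Each particle is [x, y, vx, vy] in image coordinates (assumed state).
    """
    moved = []
    for x, y, vx, vy in particles:
        moved.append([x + vx * dt + rng.gauss(0.0, noise_pos),
                      y + vy * dt + rng.gauss(0.0, noise_pos),
                      vx + rng.gauss(0.0, noise_vel),
                      vy + rng.gauss(0.0, noise_vel)])
    return moved

def systematic_resample(particles, weights, rng):
    """Draw len(particles) copies with probability proportional to weights."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)        # single random offset for all draws
    resampled, cumulative, i = [], weights[0], 0
    for k in range(n):
        target = u + k * step
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(list(particles[i]))
    return resampled

rng = random.Random(0)
cloud = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
cloud = systematic_resample(cloud, [0.0, 1.0], rng)   # keep the likely particle
cloud = predict(cloud, dt=0.1, noise_pos=1.0, noise_vel=0.5, rng=rng)
print(cloud)
```

One cycle resamples with the current likelihoods and then propagates the particles with the motion model before the next likelihood evaluation.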

Each foot's particle filter uses 1200 particles. The histogram of the invariance H' has 1000 bins, and the likelihood threshold for determining the lowest point is set to 10°. The target value for the person tracking control is set at the center of the tilted camera image, that is, at its origin. The gains $\lambda_v$ and $\lambda_\omega$ in the controller (7) are set to 0.6 and 1.8, respectively.

In the step of Section 3.2.4, Nakano et al. [2] extracted the lower edges starting from the center of gravity of the particles and then determined the lowest points by edge detection. However, with this approach the foot image becomes blurred by the fast movement of the foot and by changes in illumination brightness, which makes it difficult to find the lowest point by edge detection. As a result, the point is extracted at a position different from the actual foot, and the robot cannot keep a proper distance from the person. In contrast, the likelihood-based method in this paper can find the endpoints even in blurred images.


4. EXPERIMENTAL EVALUATIONS 

A Pioneer 3DX [20] was fitted with a universal serial bus (USB) camera, the ELECOM UCAM-C0220, to capture the feet of a person (see Figure 6). A Dell Latitude 3550 laptop (4 GB) and Microsoft Visual C++ 2010 Express were used to implement the proposed method, and OpenCV 2.1 was used for image processing. The proposed particle filters were implemented based on the condensation algorithm [21] in OpenCV. A Graphtec GS-LXUV illuminance sensor was mounted to measure the ambient illumination during the experiments.


Figure 6. Mobile robot Pioneer 3DX


The experiments were carried out in an indoor environment. The person walked straight into a hallway where the lights were turned off when the person started walking. Figure 7 shows an example of the illuminance measured with the illuminance meter during an experiment. The illuminance in the experimental environment varied from a maximum of 27.4 [lx] to a minimum of 1.8 [lx].


Figure 7. Illumination condition in the experimental environment (illuminance [lx] versus walk distance [m])
The proposed method is compared with an RGB color-based method for tracking a person's feet; the method of Nakano et al. [2] is based on RGB color information. We show the tracking process captured by the onboard camera for the RGB color-based method and for the proposed method. The brightness of each image has been enhanced for visibility; the actual images are darker.

Figure 8 shows the results of the RGB color-based method, in which the particles disperse into the shadows of the feet as the illumination intensity decreases. In Figure 8(a), the robot identified the target person by background subtraction and determined the lowest positions. In Figures 8(b) and (c), the robot recognized the feet of the person and followed the person because the hallway lighting was bright. From Figure 8(d), the particles began to disperse toward the shadows of the feet, and the distance between the person and the robot began to increase. Because the person's pants were dark blue, particles in the shaded area were also assigned likelihood. In Figure 8(e), the shadows of both feet were extracted as the lower endpoints, even though the feet themselves were captured by the camera. In Figure 8(f), the robot lost sight of the feet and could no longer track the person. The illuminance at Figure 8(f) was 7.2 [lx]; from that point on, the feet could not be recognized and tracking was impossible.

In the proposed method shown in Figure 9, the particles did not disperse into the shadows of the feet even when both feet were difficult to recognize because of the darkness, which shows that the color invariance is robust against shadows. In Figure 9(a), the robot identified the target person by background subtraction and determined the lowest positions. In Figure 9(b), the brightness was equivalent to that at which tracking failed with the RGB color-based method; in the proposed method, a group of high-likelihood particles was distributed around both feet, and the lower endpoints of both feet were accurately detected. In Figures 9(c), (d), and (e), the robot recognized both feet and tracked the person even as the person walked into the darker corridor. In Figure 9(f), the lower endpoints of both feet were correctly extracted at the darkest place. The illuminance at Figure 9(f) was 2.0 [lx], a darker environment than the point at which the RGB color-based method failed.

In this experiment, the RGB color-based method and the proposed method were each run 50 times, and the illuminance was measured with the illuminance meter. Table 1 shows the mean and standard deviation of the minimum illuminance at which the robot could follow the person.


Figure 8. Tracking scenes based on RGB color information: (a) T=0.0 [s], (b) T=6.8 [s], (c) T=12.2 [s], (d) T=15.8 [s], (e) T=23.5 [s], (f) T=28.7 [s]
Figure 9. Tracking scenes based on color invariance: (a) T=0.0 [s], (b) T=10.3 [s], (c) T=40.9 [s], (d) T=47.5 [s], (e) T=54.7 [s], (f) T=58.3 [s]


Table 1. Minimum illumination (mean ± standard deviation, 50 trials each)

Method              Minimum illumination [lx]
RGB color space     7.2 ± 0.4
Color invariance    2.0 ± 0.1


5. CONCLUSION 

In this paper, we considered the recognition of a person's feet and tracking control by a camera mounted on a mobile robot under illumination variation. We proposed an image-based tracking control method, a visual feedback control that is robust to errors in the camera parameters. Furthermore, to cope with changes in the lighting environment, the color information of the image was converted into color invariance, and a person tracking method using it was proposed. The effectiveness of the proposed method was evaluated by experiments with an actual mobile robot. The illuminance was also measured with an illuminance sensor during the experiments, and it was shown that the proposed technique could track the person even when the lighting brightness was low.